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Gene expression levels are important molecular quantitative traits that link 
genotypes to molecular functions and fitness. In Drosophila, population-genetic 
studies in recent years have revealed substantial adaptive evolution at the ge¬ 
nomic level. However, the evolutionary modes of gene expression have remained 
controversial. Here we present evidence that adaptation dominates the evolution 
of gene expression levels in flies. We show that 63% of the observed expression 
divergence across seven Drosophila species are adaptive changes driven by direc¬ 
tional selection. Our results are derived from the variation of expression within 
species and the time-resolved divergence across a family of related species, us¬ 
ing a new inference method for selection. Adaptive gene expression is stronger 
in specific functional classes, which include regulation, sensory perception, sex¬ 
ual behavior, and morphology. Adaptation increases with broader codon usage 
and, consistently, highly expressed genes contribute less to adaptation. While 
most genes show coherent evolution in both sexes, we identify a large group 
of genes with sex-specific adaptation of expression, which predominantly occurs 
in males. Our analysis opens a new avenue to map system-wide selection on 
molecular quantitative traits independently of their genetic basis. 
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In recent years, several studies have found evidence for widespread adaptive evolution 
of the Drosophila genome [1-3]. This includes adaptive changes in non-coding sequence, 
consistent with classical ideas on the importance of regulatory evolution for phenotypic 
adaptation [4], Gene expression levels are important molecular phenotypes that quantify 
the effects of regulation on organismic traits and htness. Insights on how genome evolution 
affects gene expression have come from studies of quantitative trait loci (QTL); see refs. [5-7] 
for recent reviews. In yeast, at least 10% of the genes have been inferred to undergo adaptive 
evolution of expression [8]. In flies, expression-QTL analysis has been used to estimate cis- 
and trans- effects on expression [9,10] and to compare the evolution of expression and that 
of the underlying regulatory sequence [11]; related studies have been performed in yeast [12]. 
However, given the complexity of the regulatory genotype-to-phenotype map and the limited 
sensitivity of QTL studies, our understanding of how adaptive genome changes relate to 
mRNA and protein levels has remained incomplete [5,7,13]. 

An alternative approach is to analyze the evolution of gene expression by methods of 
quantitative genetics, without explicit reference to genetic evolution of the QTL [6,7,14-21]. 
These studies compare the expression divergence across species, the variation within species, 
and the expected behavior for neutral evolution [22]. A broad picture of evolutionary con¬ 
straint on gene expression levels caused by stabilizing selection has emerged in a number of 
species, including Drosophila [6,14,15,17,20]. A comparative study between human and chim¬ 
panzee has produced signatures of predominantly neutral evolution of gene expression [16]. 
Other studies in primates have identihed stabilizing selection, as well as lineage- and tissue- 
specihc directional expression changes [6,17,19,21,23-25]. However, it has remained difficult 
to demonstrate that positive selection, as opposed to relaxed stabilizing selection, is the evo¬ 
lutionary cause of expression divergence [5]. Thus, estimating the genome-wide contribution 
of adaptation to the evolution of gene expression is an outstanding problem. 

In this paper, we show that adaptation is the prevalent evolutionary mode of gene expres¬ 
sion throughout the Drosophila genus. Our method is based on recent theoretical results on 
the evolution of molecular quantitative traits [26-28] and has two essential features. First, 
we infer adaptation driven by directional selection together with conservation under stabi¬ 
lizing selection, which allows us to discriminate these two modes in the evolution of gene 
expression. Second, we identify observables that decouple from number and effects of the 
underlying QTL. These molecular determinants of gene expression are often unknown, vary 
considerably between genes, and confound a naive phenotype-based inference of selection. 

Results 

The pattern of gene expression divergence 

We use gene expression data from samples of inbred male and female individuals [19], which 
cover 6332 orthologous genes in seven Drosophila species. A phylogenetic tree of these species 
is shown in Fig. SI. Gene expression levels are dehned in a standard way as logarithms of 
mRNA counts, suitably normalized to account for differences in assay sensitivity between 
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experimental probes (Materials and Methods). Within each population, we use these data to 
estimate the mean expression level of a gene, its total heritable expression variance (referred 
to as expression diversity), and its non-heritable expression variance between inbred individ¬ 
uals. For each pair of populations, we obtain the cross-species expression divergence of a gene 
as the squared difference between the population mean levels (see Materials and Methods 
for details). These quantities dehne the (suitably scaled) divergence-diversity ratio hi, which 
plays a key role in our evolutionary analysis (Box 1 and Materials and Methods). For a given 
gene, the evolution of the hi ratio depends only weakly on the effect distribution of its ex¬ 
pression QTL and on the amount of recombination between these loci. For neutral evolution, 
this property is implicitly contained in classical quantitative genetics approaches [22,29], but 
it holds more generally under stabilizing and directional selection [26-28]. 

To obtain a genome-wide evolutionary picture of gene expression in Drosophila, we eval¬ 
uate the aggregate divergence-diversity ratio for all genes in our data set (Materials and 
Methods). Grouping the species into 6 clades, we obtain a consistent pattern of the G ratio 
as a function of divergence time, r (Fig. lA). These data show macro-evolution of expression 
levels: already the average expression divergence for the youngest clade, D. melanogaster 
and D. simulans, is by a factor 2 larger than the average diversity within species (Fig. SI). 
On the other hand, the observed O values for all clades are substantially smaller than the 
expected values under neutral evolution, which can be computed analytically [26]. This 
constraint suggests that gene expression evolves under stabilizing selection, in agreement 
with previous studies [14,15,20] and with a standard Qst/Fst analysis [29] (Materials and 
Methods). Importantly, however, the O ratio increases with divergence time throughout the 
Drosophila genus and does not show evidence of saturation for larger values of r, in accor¬ 
dance with a similar pattern of the expression divergence observed previously [19]. In the 
following, we will show that this feature reflects adaptive evolution of gene expression. 

Fitness model for gene expression 

The inference of adaptation is based on a minimal dynamical model of selection: gene 
expression levels evolve in a single-peak fitness seascape [26,28]. This model is illustrated 
in Box 1 and formally defined in Materials and Methods. The htness peak for a given gene 
performs a random walk over macro-evolutionary periods. This walk maps continual changes 
of the optimal expression of that gene, which are generated by long-term environmental 
shifts and epistatic co-evolution with other genes. Despite its simplicity, the seascape model 
combines two salient features of selection on gene expression: stabilizing selection generates 
evolutionary constraint, and directional selection drives long-term adaptive changes. These 
selection components are measured by two parameters, the stabilizing strength c and the 
driving rate v. 

Here we use the dependence of the divergence-diversity ratio on evolutionary time, D(r), 
to infer the htness seascape of gene expression (Box 1). This method discriminates direc¬ 
tional selection in a genuine htness seascape from purely stabilizing selection in a static htness 
landscape, providing a more powerful inference of adaptive evolution than Qst/Fst analy- 
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sis [29] (Materials and Methods). It also estimates the most important summary statistics of 
the adaptive process: the cumulative htness flux <h, which measures the htness gain through 
adaptive expression changes over an evolutionary period (Box 1, Materials and Methods). 

The fitness seascape of Drosophila gene expression 

We hrst use the aggregate divergence-diversity data to infer a gene-averaged htness seascape 
of expression levels in Drosophila (Fig. lA). The least-square htted seascape model (green 
line) contains stabilizing and directional selection, leading to adaptive evolution. This model 
explains the observed pattern r2(r): the short-term evolutionary constraint is caused by 
stabilizing selection, and the approximately linear long-term increase signals adaptation. 
Because genetic drift and adaptation differ in tempo, the contribution of adaptation to 
expression divergence depends on evolutionary time: the adaptive part is small for the 
youngest species clades, but adaptation becomes dominant across the entire Drosophila genus 
(green shaded area). In contrast, stabilizing selection alone cannot explain the Drosophila 
expression data. In a static htness landscape, genetic drift generates a rapidly saturating 
pattern of fI(T) that is not observed in the data (Fig. S2). 

We can extend the seascape inference to individual genes, using a Bayesian inference 
scheme that decouples from number and ehects of their expression QTL (Materials and 
Methods). We obtain a posterior distribution of stabilizing strength and htness hux for 
each gene. For 54% of all genes, we infer a signihcant cumulative htness hux <h across 
the Drosophila genus; we classify these genes as adaptively regulated (Table SI, Materials 
and Methods). Fig. IB shows the average posterior distribution of the htness hux, which 
determines a clade-specihc adaptive fraction of the expression divergence (Table 1, Materials 
and Methods). This fraction increases with clade divergence time, in accordance with the 
aggregate data of Fig. lA. Between D. melanogaster and D. simulans, which diverged about 
2-3 Myrs ago, 92% of the expression divergence can be attributed to genetic drift under 
stabilizing selection. Across the entire Drosophila genus, which has its last common ancestor 
about 40 Myrs ago, we infer 63% of the expression divergence to be adaptive. The Bayesian 
scheme also allows us to quantify the overall statistical signihcance of our selection inference. 
In Fig. 1C, we plot the cumulative log-likelihood score for all genes as a function of c and 
<h. As shown by a log-likelihood test, the global maximum-likelihood seascape model is 
strongly favored over the maximum-likelihood landscape model (P < 10“^®°°) and over 
neutral evolution (P < 10“^^°'^) (Materials and Methods). We note that this analysis rejects 
neutral evolution and evolution under static stabilizing selection independently of model 
assumptions on the adaptive dynamics. 

Testing alternative evolutionary scenarios 

The minimal seascape model explains the pattern of gene expression divergence across the 
Drosophila genus in a parsimonious way. But are there equally parsimonious alternative 
modes of selection or demography that are consistent with the data? To assess the speci- 
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ficity and robustness of the seascape-based inference, we characterize the statistics of gene 
expression levels in a number of alternative modes of evolution by analytical approximations 
and simulations, and we compare the results to the Drosophila data. 

First, demographic effects may increase or decrease the effective population size in a 
specihc lineage, which affects the stabilizing strength c for all genes. As shown in Fig. S3, 
lineage-specific changes in effective population size that persist over sufficiently long evo¬ 
lutionary periods can be traced in the aggregate divergence-diversity function r2(r). Such 
effects are not observed in our data, which suggests that long-term demographic effects do 
not play a dominant role in the evolution of Drosophila gene expression levels (Fig. S3). This 
result does not exclude short-term changes of population size, which occurred, for example, 
in the recent evolution of the D. melanogaster lineage [30]. Such changes can be traced in 
sequence polymorphism spectra [31-34], but they have only minor effects on gene expression 
levels. 

Next, we ask if the Drosophila data can be explained by lineage- and gene-specihc re¬ 
laxation of stabilizing selection. We consider a specihc non-adaptive mode of expression 
changes: functional genes evolve under stabilizing selection in a static htness landscape, but 
individual genes can (partially) lose function at a given point in their evolutionary history, 
which relaxes selection on their expression. We model loss of function as stochastic events 
occurring at a small rate, independently for each gene and on each lineage. This model 
produces a divergence-diversity function ff(r) with a long-term nonlinearity that is not seen 
in the hi data (Fig. S4). The most direct way to discriminate between relaxation of selection 
and adaptive evolution is to use a directional bias: most functional genes are up-regulated by 
stabilizing selection (a similar bias has been exploited in expression QTL studies [5,8,25]). 
In the loss-of-function mode, a comparison of expression levels for a given gene should show 
small cross-species differences at higher expression levels (i.e., between the lineages with a 
functional gene) together with large deviations at lower levels (i.e, in the lineages with lost 
gene function). Accordingly, the distribution of expression divergence values for a given 
species pair should show a broad tail generated by the loss events. These features are not 
observed in our data, indicating that relaxed stabilizing selection alone cannot explain the 
evolution of Drosophila expression levels (Fig. S4). Of course, loss of gene function does 
happen in our phylogeny, but the affected genes will often lose expression altogether and, 
hence, will be suppressed in our data set. 

We can also compare the Drosophila data with alternative models of adaptive evolution. 
For example, individual genes can undergo a (partial) neo-functionalization that requires 
a major change in their expression. We describe this mode of evolution by a punctuated 
htness seascape, in which large shifts of the peak position are stochastic events occurring at a 
small rate [26]. This process produces an aggregate divergence-diversity function ff(r) that is 
compatible with the data, but a broad tail in the distribution of expression divergence values 
that is not observed (Fig. S4). We conclude that gradual but continual changes in optimal 
levels, as described by our minimal model, are the dominant evolutionary force driving the 
adaptation of gene expression in Drosophila. 
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Functional and mechanistic determinants of selection 


By applying our inference to specific classes of genes, we can get a more detailed view on 
adaptation of gene expression in Drosophila. First, we observe a strong correlation between 
codon usage and adaptation: genes with specific codons show strongly reduced adaptive 
expression divergence and lower average fitness fiux than genes with broad codon usage 
(Fig. 2A,B, Table 1). Specific codon usage is known to be prevalent in highly expressed 
genes [35]; consistently, we find stronger conservation of expression and lower levels of fitness 
fiux in this class (Table 1). Different codons for the same amino acid differ in their efficiency of 
translation [35,36], which implies that genes with broad codon usage have a higher potential 
for adaptive expression changes at the post-transcriptional level. Here we find stronger 
adaptation at the mRNA level in this gene class, which suggests a two-tier mode of evolution: 
adaptive mRNA changes lay the ground on which coherent adaptive tuning of protein levels 
can build. 

At the same time, we find no significant correlation between the fitness fiux for expres¬ 
sion changes and adaptive evolution of amino acid sequence, as measured by a McDonald- 
Kreitman test [37] (Fig. S5). We conclude that gene expression and gene function provide two 
largely independent modes of evolution. For a metabolite or a transcription factor, adaptive 
changes of its cellular concentration are often coupled with conservation of its function. 

Our gene-specific inference can be used to detect functional gene classes associated with 
adaptive evolution of regulation. A full ranking of gene classes by enrichment in adaptively 
regulated genes with associated P values is reported in Table SI. Gene functions associated 
with enhanced adaptive evolution of expression include sensory perception, regulation, neural 
maturation, regulation of growth, aging and morphology. Adaptively regulated functions 
also include response to UV radiation, which has recently been identified as an important 
climate-mediated trait in humans [25,38]. Adaptive evolution of genes related to growth, 
regulation and morphology has been previously inferred by expression QTL and comparative 
studies of gene regulation in other species [5,6,21]. Here we identify these categories from 
a quantitative, system-wide scan for adaptively regulated genes. This points to the power 
of our phenotype-based inference scheme, which is not confounded by the combinatorial 
complexity of cis-regulatory sequence in higher eukaryotes. 

Sex-specific evolution of expression 

We can also test the role of expression differentiation between male and female individuals 
for adaptive evolution across the Drosophila genus. The sex specificity of a given gene [19], 
defined as the difference between its male and female expression level — Efi is a 

distinct quantitative trait whose evolutionary pattern can be analyzed by our method. We 
can distinguish two modes of evolution: conservation of sex specificity maintained by stabi¬ 
lizing selection and sex-specific adaptation of expression (Fig. 2C). Most genes of our data 
set have well-conserved and often small sex specificity; these genes evolve their expression 
levels coherently between males and females [19]. The remaining 19% of the genes have a 
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significant cumulative fitness flux of their specificity trait, ‘hmf; we classify them as undergo¬ 
ing sex-specihc adaptation of expression in the Drosophila genus. These genes cover all four 
chromosomes of the Drosophila genome. 

Gene functions associated with sex-specihc adaptation of expression include regulation 
of translation, reproduction, post-mating behavior and (immune) response to biotic stim¬ 
uli (Table S2). To understand the distribution of these adaptive processes between sexes, 
we apply our inference to classes of genes with different species-averaged sex bias of ex¬ 
pression [39]. For male-biased genes, the divergence-diversity ratio Gmf signals substantial 
sex-specihc adaptation (Fig. 2D). Consistently, the htness hux <Finf is strongly enhanced in 
genes that are predominantly expressed in males (Fig. 2E). The htness hux is lower in the 
other classes, including genes expressed predominantly in females. Together, we hnd a re¬ 
markable evolutionary asymmetry between sexes: male bias in expression is associated with 
adaptive evolution of expression (orange shaded areas in Fig. 2D,E); female bias in expression 
is under weaker directional selection, which suggests it primarily rehects conserved physio¬ 
logical diherences between male and female organisms. This result complements a previously 
observed evolutionary asymmetry at the sequence level: genes with male-biased expression 
show increased amino acid divergence [19]. As suggested by a McDonald-Kreitman test, this 
increase can be associated with adaptive evolution of gene function (Fig. S5). 

Discussion 

We have shown that adaptive regulation accounts for most of the macro-evolutionary di¬ 
vergence in gene expression across the Drosophila genus. Genes diher considerably in the 
amount of adaptation, depending on their codon usage, sexual diherentiation, and func¬ 
tional class. These results provide evidence for system-wide adaptation of gene regulation in 
Drosophila already at the primary level of transcription, notwithstanding further evolution¬ 
ary complexities at the level of translation [6,12]. It remains to be seen whether a similar 
prevalence of adaptation in the evolution of expression will be found in different species. 

Our inference of adaptation is based on the expression divergence-diversity ratio, which 
depends on the evolutionary distance between species. It exploits two fundamental evo¬ 
lutionary features of quantitative traits: at short evolutionary distance, the divergence is 
always near neutrality; at longer distance, it depends jointly on stabilizing and directional 
selection. These features generate a divergence pattern with two distinct molecular clocks, 
as shown in Fig. 1. Importantly, the phenotypic evolution of gene expression decouples from 
details of its genetic basis. This explains why we hnd strong selection on gene expression 
levels although selection on individual QTL is often weak [40]. 

Our method can be applied to a broad spectrum of molecular quantitative traits with a 
complex genetic basis, provided comparative data from multiple, sufficiently diverged species 
are available. Such traits include genome-wide protein levels, protein-DNA binding interac¬ 
tions or enzymatic activities. For most of these traits, we have only partial knowledge of the 
underlying genetic loci and their effects on trait and htness. Our method complements QTL 
studies and opens a way to infer quantitative phenotype-htness maps at the systems level. 
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Box 1: Inferring adaptive evolution of quantitative traits. 

The schematic shows the evolution of a quantitative trait in a single¬ 
peak fitness seascape. The distribution of trait values within a 
species (grey curves) changes over macro-evolutionary periods, which 
can be observed as cross-species divergence of the mean trait values 
(grey arrow). In the underlying fitness seascape (red curves), evo¬ 
lutionary displacements of the fitness peak lead to lineage-specific 
optimal trait values and directional selection (red arrow). The min¬ 
imal fitness model has two parameters: the stabilizing strength c 
is proportional to the inverse square width of the fitness peak, the 
driving rate v measures the mean square displacement of the fitness 
peak per unit of evolutionary time (see Materials and Methods for 
precise definitions). We infer selection on quantitative traits from 
their time-resolved inter-species divergence, D{t), and their intra¬ 
population diversity. A, as defined in Materials and Methods. We 
evaluate the ratio n(r) = ttoD{t)/A, scaled by the neutral sequence 
diversity ttq and averaged over a family of traits for three or more 
species with different divergence times r (Fig. lA, squares). The 
test [26] is guided by theoretical results on the evolution of quanti¬ 
tative traits [26,28]. These provide the analytical form of 0,{t) in a 
fitness seascape (c > 0, u > 0; green solid line). The corresponding 
form neq(7') in a fitness landscape of the same stabilizing strength 
(c > 0, u = 0; blue solid line) reaches a saturation value flstab- 
Fitting the seascape model to the Q, data determines the decompo¬ 
sition fl(r) = neq('r) + flad('7‘) (blue and green shaded areas). The 
amplitude ratio Wad('r) = fIad('r)/f^(T) gives the fraction of trait di¬ 
vergence that is adaptive, i.e., driven by directional selection. In the 
linear regime II(r) « Ustab + ^ad('^)) which covers all species clades 
in this dataset, the fitted amplitudes provide simple estimates of 
the selection parameters, c « l/(2nstab) and v ^ 2IIad('r)/T, and 
of the resulting cumulative fitness flux 2N^ = 2cvt (scaled by the 
effective population size N). The divergence-diversity ratio for neu¬ 
tral evolution, no(T), is shown for reference (c = 0; grey solid line). 
A Bayesian extension of the Q test and its relationship to other 
trait- and sequence-based selection tests {Qst/Fst test [29] , Ornstein- 
Uhlenbeck models, [20,43-45] and McDonald-Kreitman test [37]) are 
discussed in Materials and Methods. 
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Figure 1: Adaptive evolution of gene expression. (A) The aggregate divergence-diversity 
ratio, from all genes of this data set is plotted against the divergence time, r for six partial species 
clades (■) and for the entire Drosophila genus (■). Species clades and divergence times (scaled by 
the rate of synonymous mutations) are defined by the phylogenetic tree of the Drosophila Reference 
Panel (Fig. SI A and Materials and Methods). These data are shown together with theoretical curves 
f2(r) under directional selection (green line), under stabilizing selection (blue line), and for neutral 
evolution (grey line). Inferred model parameters are c* = 18.6 and v* = 0.07; see Box 1 and 
Materials and Methods for model details. We infer a time-dependent adaptive component of the 
expression divergence (green shaded area); the complementary component is generated by genetic 
drift under stabilizing selection (blue shaded area). Adaptation accounts for 63% of the expression 
divergence across the Drosophila genus. See Fig. S2 for a comparison of the same data to models of 
time-independent stabilizing selection [20]. (B) Distribution of maximum-likelihood values of the 
scaled cumulative fitness flux, 2N^, inferred for individual genes (Materials and Methods). Our 
inference classifies 54% of all genes as adaptively regulated (2A<h > 4, green shaded part of the 
distribution). (C) Bayesian inference of fitness models. The posterior log-likelihood score S'(c, <I>) 
favors the optimal seascape model (c* = 18.6, 2A<I>* = 3.6; green square) over the best landscape 
model (Cgq = 16, = 0; blue square) and the neutral model (c = 0, 4> = 0; grey square). 
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Figure 2: Adaptation depends on codon usage and sex bias. (A) The aggregate divergence- 
diversity ratio, O, for genes with broad codon usage (A) and for genes with specihc codon usage 
(v) is shown together with theoretical curves under directional selection (dashed and dashed-dotted 
lines); the theoretical curve inferred for all genes is shown for comparison (solid line, cf. Fig. lA). 
Codon usage is measured by the effective number of codons [46] (Materials and Methods); inferred 
model parameters are listed in Table 1. (B) The distribution of the cumulative htness flux, 2A<1>, 
is plotted against the effective number of codons [46], n (Materials and Methods; circle: average, 
line: median, box: 50% around median, bars: 70% around median). (C) Conservation of sex 
specihcity (top panel) vs. sex-specific adaptation of expression (bottom panel). The sex-specificity 
trait (brown line) is dehned as the difference between male and female expression levels (purple 
and blue lines). The schematics show all three lines as functions of evolutionary time. (D) The 
aggregate divergence-diversity ratio of the sex-specificity trait, fimf) for all genes (□), genes with 
male-biased expression (A), and genes with female-biased expression (v); is shown together with 
theoretical curves under directional selection. (E) The distribution of the cumulative htness hux 
for the sex-specihcity trait, 2A<f>inf, is plotted against the species-averaged sex specihcity, 
(Materials and Methods). Sex-specihc adaptation (2A<hnif > 4.5, orange shaded part) occurs 
predominantly in male-biased genes. 
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Materials and Methods 

Summary 

Synonymous genome sequence is used to estimate the neutral sequence diversity ttq = (ref. [41], 
fj, is the point mutation rate and N the effective population size) and the species divergence times 
Tij (ref. [42], scaled in units of these data underlie the phylogenetic tree [42] (Fig. SI). The 
expression levels ^ measured by Zhang et al. [19] are labelled by gene a, species i, sex s, and 
replicates k of an inbred line. The levels are normalized to mean 0 and cross-gene variance 1 for each 
individual; the effects of this normalization on our analysis is tested in Fig. S6. For a given gene 
a, we estimate isogenic variance between experimental replicates of an inbred line and genetic 
mean F" in each species, and the divergence Dfj = (F" — F")^ between any two species i,j-, we 
also estimate genetic variance (expression diversity) A“ using data from the two distinct strains of 
D. simulans. We dehne the aggregate divergence-diversity ratio fl"- = 7rQ{Dij)/(A), where angular 
brackets denote averages over genes. For each clade C in our phylogeny, we obtain the aggregate 
data (rc, flc) shown in Figs. lA, 2A by averaging Tij and fljj over all pairs of species i,j € C that 
are connected via the root of the clade. 

The minimal fitness seascape for a given gene takes the form 

where the optimal trait value E*{t) performs an Ornstein-Uhlenbeck process with mean square 
displacement vEq per unit of evolutionary time; E^ is the average genetic variation of expression 
in the long-term limit of neutral evolution [27]. This non-equilibrium model generates adaptive 
evolution with an average scaled cumulative fitness flux (2iV<h) = 2cuTDros. across the Drosophila 
genus (rDros. = 1-4; Fig. SI). Applying the test (Box 1) to aggregate expression data {tc,Dc), 
we infer a global fitness seascape with parameters (c* = 18.6, u* = 0.07) and an average fitness 
flux 2N^* = 3.6 per gene. Control hts of the same data to equilibrium models, including the 
well-known Ornstein-Uhlenbeck dynamics for the population mean trait [20,43-45], are shown 
in Fig. S2. The probabilistic extension of this test evaluates the Bayesian posterior probability 
distribution (5(c, 4>|E“) of individual genes, given their sample mean data E“ = (E",...,E“). 
This produces gene-specific expectation values c“ and (Figs. IB and 2B); we use the condition 
2A4>" > 4 to infer adaptively regulated genes (Table SI). The cumulative log-likelihood score 
'5(c,^) = EalogQ(c,4>|E“ ) serves to quantify the statistical significance of our inference (Fig. 1C). 

To test for lineage-specific demographic effects, we compare the aggregate U data to theoretical 
functions U(r, Ti) computed for an alternative model with a change in effective population size on 
the phylogenetic branch of species i (Fig. S3). We also examine two alternative selection scenarios: 
relaxed stabilizing selection by partial loss of function (c“ switches to a reduced value with rate 7) 
and punctuated fitness peak shifts (E*“ jumps by an amount of order Eq with a rate of order vp) 
(Fig. S4). The observed distributions of cross-species expression differences are consistent with the 
minimal seascape model but at variance with both alternative models (Fig. S4). 

To infer sex-specific evolution, we define specificity traits as differences between male and female 
expression levels, — E'^-, for each gene [19]. Genes with sex-specific adaptive evolution 

of expression are identified by a condition on the cumulative fitness flux for the specihcity trait, 
2E<h"£ > 4.5 (Table S2). Genes with male- and female-biased expression are identified using the 
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results of ref. [39]. 

We simulate Fisher-Wright evolution in fitness land- and seascapes to validate our probabilistic 
inference scheme and to establish its robustness under trait epistasis (Fig. S7). 


1. Data and primary analysis 

Seqnence data and phylogeny. Our inference procedure requires the following global 
sequence-based information (which does not include expression QTL); 


(a) A phylogenetic tree of the 7 Drosophila species included in this study. Here we use the 
tree of the Drosophila 12 Genome Consortium [42], which is based on genome-wide diver¬ 
gence at synonymous sequence sites. This tree determines six clades of phylogenetically 
related species (Fig. SlA), which are used in our analysis of time-dependent expression 
divergence (Figs. lA and 2A,B). 

(b) Divergence times between all pairs of species, scaled in units of the inverse neutral point 
mutation rate. The tree of Fig. SlA uses a lineage-specihc mutation rate to infer the 
length of its 12 branches. The scaled divergence time Tij for a given species pair {i,j) is 
the sum of the lengths of the branches connecting these species. The scaled divergence 
time of a clade C is dehned as an average over species pairs. 


TC 


1 

|Ci||C\Ci 


Hi, 

iGCi jGC\Ci 


( 1 ) 


where C is the set of species in the clade and (Ci, C 2 ) is the portioning of this set dehned 
by the root node. These divergence times differ substantially from previous estimates 
based on amino acid distances [20]. 

(c) The neutral nucleotide sequence diversity. 


TTo = 2pp,N, (2) 

where N is the effective population size, and p = 1,2 for haploid/diploid organisms, 
respectively. Here we use ttq = 0.0112, as determined in ref. [41] from genome-wide 
polymorphism data at synonymous sequence sites. The sequence diversity enters the 
dehnition of the scaled D ratio in equation 9 and the probabilistic extension of the D 
test (section 2). 

Expression data. We use genome-wide expression data from 7 Drosophila species ob¬ 
tained by Zhang et ah [19] (Gene Expression Omnibus under accession number GSE6640). 
These data are well suited for our analysis. They cover several clades of species that are well 
comparable at the organismic level and sufficiently diverged for adaptive evolution of expres¬ 
sion to be detectable (section 2). Moreover, Drosophila has larger effective population size. 
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higher mutation rates, and shorter generation times than typical mammalian species [47], and 
adaptive evolution has been detected at the genomic level by several methods [1-3]. Hence, 
compared to more recent data from other species [24,48,49], the Drosophila expression data 
of Zhang et ah [19] are a suitable target for the inference of adaptive evolution. These data 
contain mRNA intensity measurements for a number of male and female inbred replicates 
in each species. Specihc microarray platforms were designed for each of these species. Each 
platform has an array of probes mapped to assembled genome sequences and to GLEANR 
gene annotations by the Drosophila 12 Genomes Gonsortium [42], which also provides se¬ 
quence homology tables. We restrict the analysis to the 6332 genes that have unambiguous 
one-to-one orthologs across all lines and are tested by at least four probes in each microarray 
platform. We obtain a set of expression levels ^ (dehned as log 2 intensities) labelled by 
gene number a G {1,... , 5 f = 6332}, species i G {mel, sim, yak, ana, pse, vir, moj} (Fig. SI), 
sex s G {m, / }, and inbred replicates k G {1,..., ki^s = 4 — 7}. The data contain two strains 
of D. simulans (14021-0251.011 and 14021-0251.198), which are used to estimate the genetic 
variance of expression (see below). 


Normalization of expression levels. We dehne a linear transformation of the levels [50], 

- {Ei,S,K) 






( 3 ) 


where k) and denote mean and variance of the expression across all genes in a 
given individual {i, s, k). The transformed levels ^ are shifted to mean 0 and normalized 
to variance 1 across all genes in each individual. The transformation (3) is a heuristic to 
reduce differences in probe sensitivity between microarrays (each individual is measured 
in a separate array). To test its influence on our inference of selection, we compare the 
aggregate D ratio, which is dehned in equation (9), for untransformed expression levels, 
expression levels with only shift (E"^^ —)■ — (Ej^^ ^,)), and expression levels with shift 

and normalization [50] (Fig. S6A-G). Shifting to zero mean turns out to be an essential 
step to remove spurious expression divergence. The subsequent normalization to variance 
1 affects the D data and our inference of selection only weakly. However, additively and 
multiplicatively transformed expression levels produce less noisy D data than levels with 
only additive transformation (cf. section 2). Hence, we use levels ^ as given by the 
transformation (3) for our evolutionary analysis. 


Expression statistics within and across populations. Using the normalized expres¬ 
sion levels, we can dehne averages and natural variation of expression at three diherent 
levels: 

(a) The mean and variance of expression across experimental replicates of an inbred line 
characterize the distribution of expression levels for a given genotype. Here we estimate 
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these quantities from the data of each inbred line, 


pa _ _ \ ^ pa 




- f 

t,S,K l^SJ 5 


and we dehne the sample mean and variance, 


Et = \{Elm + E^j), s: = + S^j). 


(4) 

(5) 


(b) The genetic mean and diversity of expression characterize the distribution of heritable 
expression differences in a population. Heritable components of quantitative traits are 
often inferred from “common garden” breeding experiments under standardized envi¬ 
ronmental conditions. Here we estimate the genetic mean and diversity for a given gene 
from the data within one species. 


r« = i?“± 







a 

siml 


')2 

^sira2) 


1 

—5“ 

h 


( 6 ) 


where we have included the expected sampling error for T". The data set of ref. [19] limits 
the inference of expression diversity to a broad estimate from two strains of D. simulans. 
This is sufficient for our analysis, because the inference of adaptation decouples from the 
precise value of (A) (section 3). Similarly, we dehne the expression dimorphism between 
males and females in each species. 


A 






e: 


ij) 


A:/* 


(7) 


(c) The expression divergence is dehned as the squared difference between population means, 
Dfj = (T" — T’jY, and characterizes evolutionary expression differences between two 
species. Here we estimate the divergence for a given gene from the cross-species data. 


= {E- - E^f - A“ - -5t - -5 


ki 


kj 


( 8 ) 


Equations (6) and (8) follow Wright’s decomposition of the variance of a quantitative trait 
into intra- and inter-species components [59], which underlies the quantitative genetics sum¬ 
mary statistics Fst and Qst (see section 2). For the analysis of sex-specihc evolution (sec¬ 
tion 3), we use the same rationale for the sex-specihcity traits — Efj. 

In Fig. SIB, we compare gene-averaged values of isogenic variance, diversity, dimorphism 
and divergence (these averages are denoted by angular brackets), as well as the cross-gene 
variance of expression. We hnd a clear ranking {5i) < (A) < (Ai^rnf) < (Eij) < Vi for all 
species i and j, where Vi = (Tj) ^ 1 by our normalization. In the H test for selection on 
gene expression, we use diversity and divergence estimates given by equations (6) and (8) 
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in aggregate measures across groups of species and classes of genes. However, our data set 
has a low number of individuals per species. Hence, single-gene estimates of diversity and 
divergence are noisy, which calls for a probabilistic inference of selection. The H test and its 
probabilistic extension for individual genes are described in section 2. 

Divergence-diversity ratio, hi. The aggregate expression divergence-diversity ratio 
for a given species pair (i, j) is dehned as 

a.i = (9) 

where ttq is the neutral sequence diversity (equation 2). The expression diversity and gene- 
specihc divergence values and A"- are given by equations (6) and (8). Angular brackets 
denote averages over all genes in our dataset, (Dij) = - prefactor in equation 

(9) is chosen such that H ~ 1 for neutral evolution in the limit of long divergence times 
(section 2). The divergence-diversity ratio He for a species clade C is dehned as an average 
over species pairs. 

He = ^ , V V Hi,-, (10) 

in analogy with the dehnition (1) of clade divergence times. We also dehne divergence- 
diversity ratios Vtf- and H^ for specihe gene classes using restricted averages ■ ■)g. 

2. Inference of selection on gene expression 

Evolutionary model. We consider the evolution of gene expression levels under genetic 
drift, mutation, and selection given by a htness model with peak displacements on macro¬ 
evolutionary time scales. In the minimal seascape model [26,28], the htness of a given gene 
depends on its expression level E and on evolutionary time f, 

f{E,t)=r-co{E-E*{t)Y. ( 11 ) 

The expression value of maximum htness, E*{t), performs an Ornstein-Uhlenbeck random 
walk with dihusion constant vq, average value S and stationary mean square deviation t'^Eq, 
where is a constant of order 1. This process is dehned by the Langevin equation 

where r]{t) is the random variable of a delta-correlated Gaussian process with average 0 
and variance vq. These random variables are assumed to be independent for each gene and 
on each lineage. The Ornstein-Uhlenbeck htness seascape should not be confused with a 
previous Ornstein-Uhlenbeck model for the evolution of quantitative traits under stabilizing 
selection [20,43-45,51-53] (a detailed comparison is given below). 
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The minimal seascape model captures two kinds of selection on gene expression in a 
unified way: 


(a) Stabilizing selection. This type of selection constrains the intra- and inter-population 
variation of expression levels to values around E*{t). We dehne the dimensionless stabi¬ 
lizing strength 

c = 2NElco, (13) 

where N is the effective population size and the trait scale Eq is given by the average 
genetic variation of expression in the long-term limit of neutral evolution [27], Eq = 
\\m.T-^oo{D{T))/2. In the limit case Vq = 0, the htness seascape reduces to a static 
htness landscape, f{E) = f* — Cq{E — E*Y, and stabilizing selection is the only selective 
force. This provides a simple interpretation of the selection parameter c: it compares the 
(hypothetical) genetic load cq Eq of a neutrally evolving trait evaluated in the landscape 
f{E) and the actual genetic load l/2iV in the same landscape, assuming a mutation- 
selection-drift equilibrium at low mutation rates [28]. This parameter signals the regimes 
of weak (c < 1) and strong (c > 1) stabilizing selection [27]. 


(b) 


Directional selection. In a fitness seascape, this type of selection triggers adaptive re¬ 
sponse of the population mean trait in the direction of htness peak displacements. We 
dehne the scaled driving rate 



(14) 


This parameter measures mean square displacement of the htness peak, in units of E^ 
and per unit l//i of evolutionary time. In maero-evolutionary seascapes, v is sufficiently 
low for population to follow htness peak displacements; such seascapes are a joint model 
of stabilizing and directional selection [26]. The values of v inferred from our data fall 
in this regime (see section 2). Because the seascape dynamics is a short-range Markov 
process, the mean square peak displacement over a scaled evolutionary time r is then 
simply Eq vt. (Here we express v in units of /i and r in units of l//i, which dihers 
slightly from the notation in refs. [26,28].) In the long-term regime vr 3> r^, the htness 
peak dynamics becomes stationary with mean £ and variance r^i7g. This regime turns 
out to be well beyond the divergence times in our species sample. Hence, the statistics 
of Drosophila gene expression levels and our inference of selection are independent of r^. 


Fitness flux. This measure of adaptation is dehned as the speed of movement on a htness 
land- or seascape by genotype or heritable phenotype changes in a population [26,54]. The 
cumulative htness hux associated with the population mean expression level r(f) of a gene 
in a htness seascape f{E,t) is given by 


$(r) 



df{r,t) dr(t) 
(9r dt 


dt. 


(15) 
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This quantity measures the total amount of adaptation over a macro-evolutionary period r in 
a population history. This quantity satishes the htness flux theorem [54], which generalizes 
the Fisher’s fundamental theorem of natural selection to mutation-selection-drift processes. 
As shown by the htness hux theorem, the average cumulative htness hux over parallel evo¬ 
lutionary histories, in units of l/2iV, measures the importance of adaptation compared to 
genetic drift: adaptation is substantial if (2iV<h(r)) > 1. For a stationary adaptive process 
in the minimal seascape (11), the average scaled cumulative htness hux takes the simple 
form [26,28] 

{2iV<F(r)) ~ 2cur, (16) 

up to factors of order tto, as quoted in Box 1. The exact functional form of the htness hux 
is given in reference [26]. This quantity is closely related to the time-dependent fraction of 
expression divergence that is adaptive, uJs^{t) (equation 24). We introduce the shorthand 
$ = <h(rDros.) with TDros. = 1-4 (Fig. SlA); this quantity measures the amount of adaptation 
across the Drosophila genus. By the probabilistic inference method discussed below, we 
obtain expectation values 2iV<F“ of the rescaled htness hux for individual genes over the 
divergence time of the Drosophila genus (equation 33). We use these values to describe 
the overall statistics of expression adaptation (Fig. IB), to infer diherences in adaptation 
between gene classes (Fig. 2B,D; Table 1), and to dehne signihcantly adaptive genes (using 
a threshold 2iV<F“ > 4; Table SI). For the analysis of sex-specihc adaptation (Fig. 2C,D), 
we dehne an analogous htness hux 2iV<Fmf for sex-specihcity traits (section 4). 


Evolutionary modes of quantitative traits. In the minimal seascape model, the aggre¬ 
gate ratio dehned by equation (9) depends on the divergence time r and on the selection 
parameters c and v. We can use this dependence to distinguish three modes of evolu¬ 
tion [26,28]: 


(a) 


Neutral evolution (c = 0). The divergence-diversity ratio has an initially linear increase 
due to mutations and genetic drift, and it approaches a maximum value 1 with a scaled 
relaxation time of 1, 


^o(t) ^ 


T for r 1, 
1 for r 3> 1. 


( 17 ) 


The short-term behavior rehects the linear growth of the average divergence [22,55,56], 


(D(r))o ~—'^—T for r 1. (18) 

'^0 

The growth rate is the average diversity at neutrality divided by the sequence diversity 
(equation 2). This ratio equals the mutational variance of a quantitative trait as dehned 
in refs. [22,55,56], up to a rescaling of evolutionary time to units of l/p. 

(b) Evolution under stabilizing selection (c > l,u = 0). In a static htness landscape, the 
divergence-diversity ratio approaches a smaller maximum value, flstab(c) < 1, with a 
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proportionally shorter relaxation time [26], 


fieq(r) ~ 


r for r < nstab(c) 

fistab(c) for r > 12stab(c). 


(19) 


Over a wide range of evolutionary parameters, which includes the inferred values for the 
data set of this study, the maximum value depends on the stabilizing strength in a simple 
way, Ostab(c) l/(2c), with corrections for weaker selection and for larger nucleotide 
sequence diversity [27]. 

(c) Adaptive evolution under stabilizing and directional selection (c > l,v > 0). In a 
genuine htness seascape, the divergence-diversity ratio acquires an adaptive component. 


0 (r) = Oeq(r) Oad(v) 


r for r < Ostab(c) 

^stab(c) + \v[t - 20stab(c)], for T > Ostab(c), 


( 20 ) 


with corrections for r approaching the saturation time of htness peak displacements, 
jv. The universal short-term behavior fl(r) ~ r (equations 19 and 20) rehects the 
quasi-neutral growth of the divergence [26,28], 

(Il(r)) ~ T for r < nstab(c), (21) 

TTo 

where (A)(c) is the average diversity under selection. As shown by comparison with the 
neutral behavior (equation 18), selection enters only via the constraint on (A). Over a 
wide range of the stabilizing strength c, this constraint remains weak and Dir') evolves 
near neutrality [27], as long as r Ostab(c). 


The full analytical form of the functions Oo('r) (equation 17), Oeq('r) (equation 19), and 0(t) 
(equation 20) is given in refs. [26,28]. 


The O test for selection. The evolutionary statistics of the O ratio provides a joint test 
for stabilizing and directional selection on quantitative traits. We can infer the selection 
parameters of a seascape model by htting the function fl(r) (equation 20) to data (r, f2). 
This method has the following properties: 


(a) The test requires data (r, f2) from species with different divergence times in the regime 
T ^ f^stab- In the quasi-neutral regime r < fistab, the ratio is insensitive to selection 
(equations 17-20). 

(b) By the decomposition (equation 20), the test infers a time-dependent fraction 


t^ad('r) 


^ad("^ ) 

fl(r) 


( 22 ) 
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of the aggregate trait divergence to be adaptive. The complementary fraction, l—cUad ('?’), 
is attributed to genetic drift under stabilizing selection. 

(c) We can approximate the divergence-diversity ratio (equation 20) by the linear form 
r2(r) ^ ffstab + f^ad('r) = f^stab + VT/‘I. Therefore, already a linear £t to data produces 
simple estimates of stabilizing strength and driving rate. 


c 



V ~ 


2nad('r) 

r 


(23) 


and infers the adaptive fraction of expression divergence, which is related to the average 
scaled fitness flux [26] (equation 16), 


t>.^ad(T) 


flstab 

’ 


(2iV$(r)) 


2a;ad(r) 2(fi(r) ^stab) 


1 - UJadir) 


a 


(24) 


stab 


(d) The n ratio of a quantitative trait depends on the selection parameters c and v, but it 
decouples from the genetic basis of the trait. Specifically, it depends only weakly on the 
number and effect size of the underlying QTL [26,27], on the amount of recombination 
between these sites [26,27], and on the nonlinearities in the genotype-phenotype map 
(trait epistasis; see section 5 and Fig. S7B). The hi statistics also decouples from details 
of the selection dynamics; it can be applied to punctuated adaptive processes, which have 
fewer and larger peak displacements [26] (section 3). 


Application of the hi test to gene expression data. In Fig. 1, we compare the model 
function f2(r) (equation 20) to aggregate expression data (Tc,f2c) for six Drosophila species 
clades (equations 1 and 10). The best-fit seascape model (c* = 18.6, v* = 0.07; green line) 
explains these data, which produces evidence for adaptive evolution of gene expression. Using 
the decomposition into adaptive and drift components (green and blue shaded areas), we 
obtain a cumulative fitness flux (2iV<l>) = 3.6 across the entire Drosophila genus (equations 23 
and 24). Importantly, the inference of adaptive evolution decouples from the precise overall 
scale of n, which is influenced by our limited information on expression diversity (section 2). 


Control analysis of eqnilibrium models. We can also compare the aggregate expression 
data (tc, flc) fo models of time-independent stabilizing selection: 

(a) Fitness landscape model. In contrast to the seascape model, the best-fit landscape model 
(ceq = 13, t; = 0) provides a poor fit to the data (Fig. S2A). It captures the average 
ratio across the Drosophila clades, but fails to describe the systematic amplitude differ¬ 
ences between these clades. In particular, the landscape model drastically overestimates 
the divergence of close species, Dmei-yak and Dmeisim- 

(b) Ornstein-Uhlenbeck model. In a previous study, Bedford and Hartl [20] analyze the same 
data set and infer broad stabilizing selection on expression levels, which is consistent 
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with our results. However, they observe a saturation of gene expression divergence 
that is at variance with the inference of a linear growth on time scales beyond the 
divergence time of D. melanogaster and D. simulans (Fig. lA and ref. [19]). This can 
be traced to differences in data analysis. First, Bedford and Hartl [20] use amino acid 
distances in their phylogeny. These distances are affected by selection [57] and produce 
relative branch lengths that differ substantially from the phylogeny based on fourfold 
synonymous sites [42] used in this study (Fig. SlA and Fig. S2). Second, Bedford 
and Hartl [20] analyze expression divergence for pairs of species, while we group the 
species into clades (Fig. SlA). These differences lead to a more noisy dependence of the 
expression divergence data on evolutionary time and make the distinction of conservation 
and adaptation more difficult (Fig. S2B). Bedford and Hartl [20] £t these data to an 
Ornstein-Uhlenbeck model of evolution under stabilizing selection [43] (equation 28), 
which is described below. This model has two independent parameters, which equals 
the number of fit parameters for our minimal seascape model. Similarly to our landscape 
model, the best-fit Ornstein-Uhlenbeck model explains the average expression divergence 
across the Drosophila genus (Fig. S2B), but it cannot explain the pattern of expression 
divergence between close species. The model predicts a quasi-neutral linear growth of 
the divergence with Dmei-yak/Dmei-sim ~ Tmei-yak/Tmei-sim ~ 2 (equation 21), which 
drastically overestimates the observed ratio Dmei-yak/Dmei-sim ~ 1-2. 


Comparison of the H test with related methods. Our inference method for selection 
on quantitative traits can be compared with three well-known selection tests for phenotypic 
and genomic data: 


(a) Qst/Fst ratio test for selection on quantitative traits. The summary statistics Fst and 
Qst measure the expected fraction of the total genetic variation harboured in a pair 
of populations that can be attributed to the divergence between these populations; the 
complementary fraction is attributed to the diversity within populations. Fst refers to 
neutrally evolving sequence loci [58-60], which can be regarded as a “pseudotrait” with 
aggregate divergence and diversity. Qst is the analogous measure for quantitative traits 
under selection [61]. The expected dependence of these measures on divergence time can 
be expressed in terms of the H ratio (equation 9), 


(F>(r))o ^ flo(r) 

(Il(r))o-|-2(A)o r2o('r) + 27ro’ 

(-P(^)) ^ ^(^) 

(Z1(t)) -\- 2(A) fl('?’) T 27ro 


(25) 

(26) 


where we use expectation values (...) in an ensemble of parallel-evolving populations 
and the subscript 0 refers to neutral evolution. The Qst/Fst test [29] stipulates that 
a quantitative trait is evolving at neutrality if Qst/Fst = 1 , under stabilizing selection 
if Qst/Fst < 1, and under directional selection if Qst/Fst > 1- Comparison with the 
theory of the H ratio (equations 17-21) shows that the Qst/Fst test is insensitive to 
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selection for divergence times in the quasineutral regime, 

for r < fistab(c). (27) 

rsT[T) 

The data set of this study, which has divergence times r > hlgtab and aggregate values 
Qst/^st between 0.6 for the mel-sim clade and 0.8 across the entire Drosophila genus; 
these values are obtained using equations (10), (25), and (26). Hence, this test signals 
broad stabilizing selection but no directional selection. In contrast, the H test infers 
both stabilizing and directional selection from the linear dependence H(r) (Fig. lA and 
equation 20). This inference shows a conceptually important point; stabilizing and 
directional selection are not mutually exclusive, but joint features of selection on macro¬ 
evolutionary time scales. 

(b) Ornstein-Uhlenbeck model for quantitative trait evolution. This phenomenological model 
describes a quantitative trait evolving under genetic drift and stabilizing selection [43, 
51-53] and has been applied to the evolution of gene expression [20,44,45] (a detailed 
comparison with the results of ref. [20] is given above). The model is dehned by a 
Langevin equation for the population mean trait, 

4r(() = -A (r - E’) + r,r((). ( 28 ) 

where prit) is the random variable of a delta-correlated Gaussian process with average 
0 and variance The model constants A and are usually regarded as independent 
£t parameters. The Ornstein-Uhlenbeck dynamics of the population mean trait r{t) 
around a fixed optimal trait value E* (equation 28) should not be confused with the 
Ornstein-Uhlenbeck dynamics of the time-dependent optimum E*{t) in our seascape 
model (equation 11). A Langevin equation similar to (28) can be derived from a more 
general population-genetic model for the evolution of a quantitative trait E a static 
fitness landscape f{E) = —cq {E — E*)"^, which has been introduced in ref. [27,62]. In 
this model, the population mean trait follows the Ornstein-Uhlenbeck process 

jTit) = -2(A) Co (T - E*) - 2/r (T - To) + Vr{t), (29) 

where Tq is the genetic mean trait in the long-term limit of neutral evolution and prit) is 
the random variable of a delta-correlated Gaussian process with average 0 and variance 
(A) /N. Gomparison with equation (28) determines the Ornstein-Uhlenbeck coefficients 
in terms of the stabilizing strength and the average trait diversity (A = 2(A) cq, = 
{A)/N). Equation (29) contains an additional mutational term (— 2/i)(r — Tq), which 
implies that the expectation value (T) differs from the optimum trait value E*. We note 
that the diffusion constant {A)/N determines the behavior of the O ratio (equation 19), 
of the trait divergence (equation 21), and of the Qst/Fst ratio (equation 27) in the 
quasineutral regime (r Ogtab)- The Ornstein-Uhlenbeck model has been generalized to 
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account for lineage-specific stabilizing selection in a phylogeny [43-45,51-53]; however, 
inferring independent selection parameters for each lineage may lead to overhtting of 
our data set. Instead, we use the seascape model (11) to infer lineage- and gene-specihc 
changes of the trait optimum E*{t) using a single additional selection parameter v. 

(c) McDonald-Kreitman test for adaptive sequence evolution [37]. The conceptually closest 
sequence-based test evaluates the sequence divergence-diversity ratio hi for a sequence 
class under putative selection (e.g., non-synonymous mutations in protein-coding se¬ 
quence) and compares it to the analogous ratio flo for bona fide neutral changes (e.g., 
synonymous mutations). Positive selection in the query sequence is inferred if hi > flo- 
In this case, the amplitude ratio ctseq = (fl — estimates the fraction of non- 

synonymous substitutions that are adaptive, i.e., driven by positive selection [57]. This 
is a variant of the McDonald-Kreitman test [37]. It requires data from query sequence 
and from neutral sequence, but only from a single pair of species with a divergence time 
beyond the coalescence time. In contrast, the D test requires only data from traits under 
selection, but from three or more species with divergence times beyond the equilibrium 
relaxation time Dstab- These differences highlight distinct evolutionary characteristics of 
quantitative traits. First, such traits have a quasi-neutral regime of macro-evolutionary 
divergence times (equation 21) that has no direct analogue in sequence evolution [27]. 
Second, in most cases we do not have a gauge of neutrally evolving traits analogous to 
synonymous sequence. 

Probabilistic inference of selection. Here we describe the extension of our selection 
inference method to expression data of individual genes. A minimal seascape model is 
determined by the parameters (c, n) or equivalently by (c, <h), where <F = 2cnrDros./2A^ 
denotes the expected cumulative htness flux over the genus divergence time (equation 16). 
We derive a posterior probability distribution Q{c, <F | E“), where E = (E“,..., E^) denotes 
the expression levels of gene a in the 7 species of our data set. This derivation consists 
of three steps: we obtain the probability distribution Q{r \c, <F) of population mean traits 
P" = (T",..., T") in a given seascape model, we include sampling effects to determine 
the distribution (5(E|<I>), and we use Bayes’ theorem to infer the posterior distribution 
Q(c,4>|E“). 

The basic building block of evolutionary statistics in the minimal seascape model has been 
derived previously [26]: the lineage propagator Gr(r, E*\Ta, E*) is the probability density of 
mean and optimal trait values (T, E*), given the values (T^, E*) in an ancestral population at 
scaled evolntionary distance r. The lineage propagator is related to the stationary distribu¬ 
tion of the seascape dynamics, (5stat(r,E*) = limT-^oo GT-(r, E^ITq, E*). Both distributions 
are Gaussian functions that depend on the seascape model parameters and on the neutral 
variance (trait scale) Eq; their detailed analytical form is given in equations (30)-(33) and 
(A.15)-(A.20) of ref. [26]. The probability distribution of popnlation mean traits across the 
Drosophila genus is the stationary distribution for its last common ancestor mnltiplied by 
the lineage propagators for all branches of the phylogeny; this expression is to be integrated 
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over all unknown expression levels. Specifically, we obtain 


i-i 


Q(r“ I c, e*) J] e* \ ... drf dEi... dEi, 


2=1 


(30) 

where i = 1,... ,k labels the extant species and i = k + I,... ,l the clade ancestor species 
(with I = 2k —1 = 13 and the index I referring to the last common ancestor of all species), a{i) 
denotes the closest ancestor of species i, and r(i) is the scaled length of the branch between i 
and a{i). The deviations of the expression measurements from the population mean trait 
T" can be described by a Gaussian sampling error model with variance (A"/2) + [S'^/ki), as 
given by equation (6). We obtain 


g(E“|c,«i>,E2) 


1 

z 


g(r“ I c, $) exp 


1 (Ef - 
2 (A“/2) + (5f/A;,)_ 


d^^.. dr^. 


(31) 


where Z is a normalization factor. This multi-variate Gaussian integral can be evalnated in 
a straightforward way by the saddle point method. Here we approximate the noisy diversity 
valnes of individnal genes by the species averages Aj and 5*. Finally, Bayes’ theorem gives 
the posterior distribution 


g(c,$|E“) 


/g(E»|c,$)Po(c,$)dEg 
/ g(E“|c,$)Po(c, d>) dP^dcd$’ 


(32) 


where Po(c, <h) denotes the prior distribntion of seascape parameters. This distribntion 
determines the maximum likelihood posterior values of stabilizing strength, htness flux, and 
adaptive fraction of expression divergence. 


(c“, <F") = argmaxg(c, <h | E"), 

C,<I> 


U. 


■“ ( 
ad\ 




(r/rpros.) _ 

(r/rpros.) + l/N ’ 


(33) 


see equation (24). In equation (32), we use a prior distribution Po(c, <h) ~ exp(—ac — 6$) 
with Lagrange multipliers a, b that calibrate the average posterior values (c) and (<h) over all 
genes to our inference from aggregate data (see above). This choice generates a conservative 
inference of gene-specihc seascape parameters that reflects two statistical featnres of our data. 
First, gene data E explained by a seascape model with parameters (c, <h) and a nentral trait 
variance Pq (see text above and ref. [27]) have a similar likelihood in a family of models with 
parameters (Ac, <h) and nentral trait variance APq, where A > 0 is a rescaling factor, as long 
as the stabilizing strength is above some minimum value. In other words, there is a residual 
freedom in model parameters that leaves the htness hux <F invariant. This freedom exists 
because the gene-specihc diversity values A" are too noisy to be included in the inference. 
Onr prior distribution favors posterior values c close to the minimum stabilizing strength, 
which are consistent with the inference from aggregate data. Second, the distribution (30) 
has an algebraic tail, g(E | c, <h) ~ for 2A<F 3> 1, which is caused by the dihusive 
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dynamics of the fitness peak. Our prior distribution suppresses this tail and favors posterior 
values <h close to the maximum-likelihood value <h*. The validation of this inference scheme 
by simulation tests is described in section 5. 


Statistical significance of the inference. The probabilistic extension of the O plays an 
important role in our global inference: to quantify the statistical signihcance of our evidence 
for adaptive evolution under directional selection. Specihcally, we evaluate the cumulative 
log-likelihood score for all genes of our data set as a function of the evolutionary variables c 
and <h, 

S(c,<I>) = ^logO(c,<I>|E“), (34) 

q :=1 

where (^(c, <h|E“) is given by equation (32). This function is shown in Fig. 1C with its 
maximum shifted to 0. The global maximum-likelihood seascape model has parameters 

(c*,$*) = argmax^(c,$) = , (35) 

compared to (c^g, 0) = arg maxc S'(c, = 0) = (16,0) for the best landscape model and 

(c, <h) = (0, 0) for neutral evolution. We obtain log-likelihood differences S{c*, <F*)—5'(Cgq, 0) = 
8396 and S'(c*,<F*) — 5'(0,0) = 12464. By a log-likelihood test, these differences translate 
into the P values quoted in the main text. Maximum-likelihood values analogous to equation 
(35) can also be dehned for classes of genes (Table 1). 


3. Analysis of alternative evolutionary scenarios 

Lineage-specific demography. Demographic effects, such as population bottlenecks, af¬ 
fect the patterns of sequence variation in Drosophila [30,31,33,34,63]. Here we examine 
the effects of strong, long-term demographic heterogeneities on the divergence and diversity 
of expression levels. Specifically, we consider changes in effective population size to a value 
Ni = XN in a given Drosophila lineage i, which is dehned by the terminal branch of species 
i in the phylogeny and extends over a scaled evolutionary time (Fig. SlA). A depletion of 
effective population size leads to a global relaxation of stabilizing selection on gene expres¬ 
sion, given by a reduced stabilizing strength Ac in the htness seascape (equation 11). For 
each clade C with i G C, we dehne the polarized divergence-diversity ratio. 


n 


c,i — 


1 

1 ^ 


iec\Ci 


(36) 


where {Ci,C \ Ci) is the partioning of clade C dehned by its root node and we assume 
i G Cl. The pairwise ratios Qij are given by equation (9). Similarly, we dehne the polarized 
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divergence time, 


(37) 


|c\Ci| 

' ' iec\ci 

In Fig. S3A,B, we plot polarized data {Tc^i,flc,i) together with background data (rc,hlc) 
from partial clades excluding species i. Under a change of population size in lineage i with 
F ^ flstab(Ac), the polarized data should follow a pattern with reduced (A < 1) or increased 
(A > 1) long-term constraint, 


n(r, Ti) = Qeq{r,Ti) + Qs,d{T,Ti) (38) 

{ r for r < nstab(Ac) 

|(hlstab(Ac) -h flstab(c)) + ^VT + J'(A, c) for T > Tj flstab(c), 

where the shift J^(A, c) is generated by the demographic inhomegeneity on intermediate time 
scales; this pattern is shown in Fig. S3A,B for A = 1/2 and A = 3. A similar calculation shows 
that short-term population bottlenecks have a negligible effect on the statistics. We observe 
no deviation between polarized and background hi data, indicating the absence of strong 
demographic effects shaping the evolution of expression levels. Equation (38) also shows 
that demographic effects do not confound the hi test for adaptive evolution under directional 
selection. For time-independent optimal trait value (i; = 0), global relaxation of stabilizing 
selection increases the divergence-diversity ratio, as noted in previous studies [5,16, 17]; 
however, it does not generate the linear increase flad('r) ^ vt/2 characteristic of htness peak 
displacements (Fig. S3). 


Gene-specific relaxation of stabilizing selection. We can also test for lineage- and 
gene-specihc relaxation of stabilizing selection on gene expression, which arises, for example, 
from a partial loss of gene function. We model the loss dynamics by a stochastic process: 
with a small rate 7 , individual genes switch the stabilizing strength of their fitness seascape to 
a reduced value Ac (with A < 1). This dynamics increases cross-species divergence and gen¬ 
erates a nonlinear time-dependent fl(r), not observed in the data (Fig. lA). To discriminate 
between relaxed stabilizing selection and directional selection, we also use the distributions 
of clade-specihc expression differences which are dehned as averages over pairwise dif¬ 
ferences AU"- = in analogy to equation (10). The observed distributions are of 

approximately Gaussian form. 


Pc(AE) 


^/2nI)c 
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(39) 


as shown by the collapse plot of Fig. S4A. This is in accordance with the minimal seascape 
model, which predicts a Gaussian distribution Pt-{AE) with variance {D{t)). In contrast, 
stochastic relaxation of stabilizing selection generates broad non-Gaussian tails increasing 
with divergence time r that are not observed in the data (Fig. S4G). We conclude that relaxed 
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stabilized selection alone cannot the observed statistics of Drosophila gene expression levels. 
This does not exclude that relaxation of selection affects some genes in our data set and more 
broadly genes with complete loss of function, which are suppressed in the set of conserved 
orthologs. 

Punctuated directional selection. The Ornstein-Uhlenbeck dynamics of fitness peaks in 
the minimal seascape model (equation 28) describes the accumulation of small but continual 
changes of optimal expression levels. Larger peak shifts can be caused by discrete ecological 
events, including major migrations and speciations, and by gene-specific factors such as neo- 
functionalization [64]. Here we model such events by a punctuated fitness seascape [26]: with 
a small rate u/i/(2r^), individual genes are subject to fitness peak shifts by an amount of 
order Eq. This stochastic model differs from previous models of lineage-specihc selection [20, 
24,43-45,51,52], where htness peak shifts are constrained to known branch points of the 
phylogeny. Evolution in a punctuated htness seascape generates divergence-diversity ratio 
r2(r) of the form (equation 20); adaptation is signalled by the same term Had('r) — ur/2 as in 
a minimal seascape of the same driving rate v. To discriminate between the two models, we 
use again the distributions Pc{AE) of clade-specihc expression differences. In a punctuated 
seascape, these distributions have broad non-Gaussian tails increasing with divergence time 
Tc that are not observed in the data (Fig. S4D). We conclude that large peak shifts are a 
sub leading factor of expression changes in our data set. 

Other modes of adaptation. Further evolutionary modes affecting gene expression in¬ 
clude: 

(a) Time-dependent stabilizing selection [26]. This type of selection can be modeled by a 
htness seascape of the form (11) with time-dependent stabilizing strength c{t), given 
by a generalized Ornstein-Uhlenbeck process with constraint c{t) > C m,„ . The recurrent 
tightening of expression constraint driven by increases of c{t) is a mode of adaptation 
that is independent of htness peak changes. The O test does not trace this mode: as 
long as the expression optimum E* is time-independent, the function 0(r) reaches an 
asymptotic value < Ostab(cmin)- This pattern is similar to evolutionary equilibrium in a 
single-peak htness landscape and does not contain the term Oad('r) ~ ur/2 characteristic 
of htness peak displacements. 

(b) Adaptive gene turnover, including sub- and neo-functionalization after gene duplica¬ 
tion [65,66], regulatory sequence duplication [67], and de novo formation of genes [68]. 
This mode is suppressed in our data set of conserved orthologous genes, but it is likely 
to be more prevalent in the complementary set of Drosophila genes. 

A detailed investigation of these evolutionary modes is beyond the scope of this study. 
Importantly, however, they do not confound the inference of adaptation under directional 
selection reported here. 
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4. Analysis of specific gene classes 

Codon usage. The effective number of codons, n, measures the redundancy of the genetic 
code within a given gene [46]. This number takes values between 20 (each amino acid is 
determined by a specific codon) and 61 (all sense codons are used). Genes with specific codon 
usage (small n) tend to have higher expression than genes with broad codon usage [35,36]. 
Here we compute the species-averaged effective number of codons, n“ = ^ nf for all genes 
in our data set. We find a consistent dependence of expression adaptation on codon usage: 

(a) Aggregate analysis by the G test signals strongly reduced adaptation for genes with 
specific codon usage (n < 42) and an enhanced adaptation for genes with broad codon 
usage (n > 50), compared to the average over all genes (Fig. 2A and Table 1). 

(b) The G test also signals strongly reduced adaptation for genes with high average ex¬ 
pression level, E°‘ = > 0-9 (Table 1). Additionally, we compare the fitness 

flux of a gene to its codon adaptation index (CAI), which measures the similarity be¬ 
tween the codon usage in a specific gene and the codon preference of highly expressed 
genes [69]. Consistently, we find a reduced amount of fitness flux in genes with high 
codon adaptation index (CAI > 0.65); these genes are likely to be highly expressed. 

(c) At the level of individual genes, there is a clear correlation between fitness flux and 
effective codon number (Fig. 2B). 

Inference of adaptive sequence evolution. For the genes in our data set, we estimate 
the fractions of synonymous and non-synonymous polymorphic nucleotides (P* and Pn) from 
the database of Drosophila melanogaster Genetic Reference Panel (DGRP) [41]. The cor¬ 
responding nucleotide divergence measures {Dg and Dn) are obtained from sequence align¬ 
ments between the D. melanogaster and D. simulans reference genomes [42]. The McDonald- 
Kreitman test [37,57] signals adaptive evolution of amino acids if Oseq = {DnPsIDsPn) — 1 > 
0. Fig. S5 shows the distribution of Oseq values for classes of genes with different amount of 
expression adaptation, measured by the fitness flux <F (equation 33). We find no correlation 
between these statistics. In each class, about 30% of the genes have Oseq > 0. This result 
does not contradict the correlation of gene expression divergence and amino acid divergence 
reported in ref. [19], because an enhanced amino acid substitution rate measured by a Dn/Dg 
test [70] may be caused by adaptive changes or by relaxation of negative selection. 

Analysis of functional gene classes. We use The Ontologizer [71] for statistical analysis 
of functional enrichment in our dataset. From a base set of all 6332 genes in our database, 
we identify enriched functional categories in the query sets of adaptively regulated genes 
(2A<F“ > 4) and genes with sex-specific adaption of expression (2A<F"f > 4.5, see below). 
We use the calculation method Parent-Child-Union with Bonferroni correction and resam¬ 
pling steps of 1000. The enriched functional categories in these gene sets are reported in 
Tables SI and S2 with a significance threshold P < 0.1 (multiple hypothesis test). We list 
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three main categories: biological processes, cellular components, and molecular functions. 
Each functional category is assigned to a functional cluster (in bold letters) that is inferred 
by REVIGO [72], using the semantic similarity measure SimRel with threshold 0.5. This 
clustering facilitates the interpretation of functional gene classes associated with adaptation 
of gene expression. 


Sex-specific evolution and sex bias of expression. To quantify differences of gene 
expression between male and female individuals, we we dehne the sex specihcity trait of a 
given gene as the difference between its expression levels in males and in females [19], 


E 


Q. 

mf,2 


T^a jpa 


(40) 


We analyze these traits by the same methods as the sex-averaged expression levels E" dehned 
by equation (5). Specihcally, we dehne the aggregate divergence-diversity ratio Gmf,c and 
the htness hux in analogy to equations (10) and (15), and we infer gene-specihc 

maximum-likelihood values 2iV<l>"£ in analogy to equation (11). We dehne two conceptually 
distinct measures of male-female diherentiation: 


(a) Sex-specihc adaptation. In accordance with ref. [19], we hnd that most genes of our 
data set have well-conserved and often small sex specihcity; these genes evolve their 
expression levels coherently between males and females. We use the rescaled htness 
hux 2iV<l>jnf to delineate coherent evolution of expression levels (i.e., conservation of the 
specihcity trait) from sex-specihc adaptation (i.e., adaptive changes of the male-female 
expression diherence), as illustrated in Fig. 2C. A set of 1155 sex-specihc adaptive genes 
is identihed by the condition 2A^<h"£ > 4.5 (Table S2); we use a more stringent threshold 
than for <l>" because the sex-specihcity trait statistics has larger statistical errors. 

(b) Sex bias. We identify genes with male- and female-biased expression in Drosophila using 

the results of Assis et al. [39], which are based on a number of statistical tests in the whole 
body and in gonads of males and females in D. melanogaster and D. pseudoobscura. A 
gene is classihed as expression sex-biased if hagged by at least three of these tests, which 
produces a list of 450 male-biased and 499 female-biased genes. A related measure of 
bias within our data set is the species-averaged specihcity trait, | Yhi 

Our analysis establishes a relation between these two measures in our data set: strong sex- 
specihc adaptation of expression occurs in male-biased, but not in female-biased genes. First, 
the aggregate ratio Omf in male-biased genes show evidence for adaptive evolution with a 
linear adaptive component ur/2. Unbiased and female-biased genes have only a small average 
divergence in their sex-specihcity trait that is of the order of the expression diversity (i.e., 
they within the error range of the observed expression levels), providing no evidence for 
adaptation (Fig. 2D). Second, the htness hux is strongly enhanced for genes with large 
(Fig. 2E). Accordingly, 32% of male-biased genes are classihed as sex-specihc adaptive. 
Functional categories associated with sex-specihc adaptation of expression are reported in 
Table S2. 
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5. Simulation tests 


In-silico evolution of quantitative traits. We use a Fisher-Wright process for the evo¬ 
lution for the evolution of populations along the Drosophila phylogeny of Fig. SlA. A pop¬ 
ulation consists of N individuals with genomes a.^^\ ..., A genotype is an £-letter 

sequence a = (ai,...,a£) with alleles au = 0,1 {k = It defines an expression 

level E{sl) = with neutral variance \ Sfc=i uniform single-locus 

effects Si] our results are insensitive to the form of the effect distribution [26]. In each gener¬ 
ation, the sequences undergo point mutations with a probability fj,To per generation, where tq 
is the generation time. The sequences of next generation are then obtained by multinomial 
sampling with a probability proportional to [1 -|- ro/(i?(a), t)], where the fitness function 
f{E,t) is given by equation (11). Simulations are performed with N = 100, ttq = 0.1 for 
traits with i = 100, uniform effects Ei = 1, and average fitness optimum S = 70. We use 
three different types of selection (for details, see ref. [26]): 

(a) Minimal fitness seascape. Before each reproduction step, a new optimal trait value 
E*{t + To) is drawn from a Gaussian distribution with mean E*{t){l — /rrou/(2r^)) -|- 
S fXToV/{2r‘^) and variance htqvEI. 

(b) Fitness landscape. The optimal trait value E* is time-independent (Fig. S3A,B). In the 
model of gene-specific relaxation of selection (see section 3), the stabilizing strength of 
individual genes switches to a smaller value, c —)■ 0.01c, with a small rate 7 (Fig. S4C). 

(c) Punctuated fitness seascape (see section 3). Before each reproduction step, a new, un¬ 
correlated optimal trait value is drawn with probability /irou/(2r^) from a Gaussian 
distribution with variance r'^E^, where is a constant of order 1 (Fig. S4D). 

Validation of the probabilistic inference scheme. To test the performance of our 
inference scheme, we generate expression values E“ = {Ef ,..., E^) for individual genes with 
trait scales Eq ^ by Fisher-Wright simulations along the Drosophila phylogeny of Fig. SlA. We 
use minimal fitness seascapes of the form (11) with input parameters (cin, Um) and a sequence 
diversity ttq = ApN = 0.05. We then infer maximum-likelihood posterior values (c“,2V<F“) 
by the probabilistic method described in section 2 (equation 33). In Fig. S7A, we plot the 
distribution of inferred fitness flux values 2V<F" against the input expectation value 2V<Fin = 
2cinUin tdtos. (equatiou 16). The underlying simulations use a range of trait scales E^^ = 
0.25—4.0 appropriate for log expression levels; the inference of does not require knowledge 
of this scale (see section 2). Fig. S7B shows the corresponding distribution of inferred values 
c" as a function of the input stabilizing strength Cjn. These simulations use a uniform 
trait scale Eq^ = 1 (inferring the actual scales requires sufficiently reliable gene-specific 
expression diversity data). The posterior values (<F",c") are seen to provide reasonable, 
on average conservative estimates of the input model parameters (cjn, 4 * 111 ) • In particular, 
the inference of a significant fitness flux (4* > 1/2N) is incompatible with evolution under 
static stabilizing selection (u = 0, c > 0) or near neutrality (c ~ 1), independently of the 
underlying model for the adaptive evolution of a molecular trait. 



Robustness of the inference to trait epistasis. The analytical theory underlying 
our inference method [26, 27] covers molecular quantitative traits with a linear genotype- 
phenotype map, E{sl) = X]fc=i (see above). Here we extend this method to nonlinear 
traits of the form i7(a) = J2k=i^k(ik + J2k<k' ^kk'dkCi-k'] such nonlinearities are commonly 
referred to as trait epistasis. The strength of epistasis can be dehned as the ratio of nonlinear 
and linear neutral trait variance, = (X]fc<A:' 

Trait epistasis introduces only minor changes to the quantitative genetics theory of 
refs. [26,27]. In particular, the H ratio retains its normalization in the neutral long-term 
limit and the quasi-neutral growth of the trait divergence is still given by equation (21), 
where A is now the total genetic diversity of the trait. To specihcally test our inference 
method, we perform Fisher-Wright simulations as described above over a wide range of the 
parameter individual epistatic effects Skk' are drawn from a Gaussian distribution with 
mean 0. In an ensemble of 6000 independently evolving traits, we record both the actual 
average fitness flux (equation 15) and the inferred fitness flux determined from the aggregate 
G ratio (equation 24). Both quantities show no systematic dependence on (Fig. S7C), 
suggesting that our inference of adaptive evolution is not confounded by trait epistasis. 
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Figure SI: Species phylogeny and levels of gene expression variation. (A) Phylogeny 
of the Drosophila genus, as reconstructed in ref. [42] from synonymous sequence divergence. Six 
clades are marked by colored triangles, their ancestral nodes by colored circles. The table specifies 
the species contained each of the clades and the clade divergence time tc (equation 1). (B) Gene- 
averaged isogenic variance (6) (o, equation 4), expression diversity (A) (v, equation 6), male-female 
expression dimorphism (A^/) (o, equation 7), clade divergence (D) (A, equation 8), and cross-gene 
variance of expression V = (P?) « 1 (x). We find a clear ranking {6i) < (A) < (A^/) < {Dij) < Vi. 
The color code for single-species data is shown in the legend, colors for clades are as in (A). 
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Figure S2: Fitness landscape and Ornstein-Uhlenbeck models as control. (A) The 

aggregate divergence-diversity ratio for clades, Oc (filled squares), is compared to the ratio for 
individual pairs of species, (empty squares). Both quantities are plotted against the divergence 
time estimated from four-fold synonymous sites [42] (Fig. SI A). Clade-based statistics is seen to 
substantially reduce the noise of the expression divergence data. The seascape model (green line 
as in Fig. lA; c* = 18.6, v* = 0.07) provides a significantly better fit to these data than the 
landscape model (blue line; Ceq = 13); see Fig. 1C and section 2 of Materials and Methods for 
a statistical model comparison. (B) The same species-pair ratio lljj (empty squares) is plotted 
against the amino-acid sequence distance used in ref. [20], uniformly rescaled to give the same 
scaled genus divergence time roros. = 1-4 as in (A). Compared to synonymous divergence, amino- 
sequence divergence is seen to produce an inhomogeneous molecular clock that adds to the noise of 
the evolutionary analysis. For these data, the seascape model (green line as in (A)), the landscape 
model (dashed blue line; c = 6.4), and the Ornstein-Uhlenbeck model [43] (blue line; A = 1.8/i“^, 
= 0.13 (A)/A; see equation 28 and ref. [20]) produce fits of overall comparable quality. However, 
both equilibrium models cannot explain the evolution of expression in the youngest clades: the 
landscape model overestimates the divergence Dmei-yak and Dmei-sim, the Ornstein-Uhlenbeck 
model overestimates the relative divergence D^ei-yak/Dmeisim- See section 2 of Materials and 
Methods. 
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Figure S3: Test of lineage-specific demography. We compare the polarized divergence- 
diversity ratio Q,c,i with species i as outgroup (equation 36, A) to background data from par¬ 
tial clades excluding species i (v); both quantities are plotted against the clade divergence time. 
(A) Left panel: Data for clades with outgroup D. melanogaster. Center and right panels: Evolu¬ 
tion with a reduced or enhanced effective population size W in the outgroup lineage. Analytical 
curves and simulation results are shown for Ni = 3N (dashed lines, k) Ni = N/2 (dashed-dotted 
lines, A) in a fitness landscape (c = 20, v = 0; center panel) and seascape (c = 20, u = 0.09; right 
panel). (B) Same as (A), with outgroup D. mojavensis. (C) Data for each of the other five species 
chosen as outgroup. These data give no evidence of long-term lineage-specific demography. The 
analytical and simulation results show that lineage-specific demography under stabilizing selection 
does not give a spurious signal of adaptive evolution in the D test. Lineage-specific demography is 
introduced in section 3, simulation details are given in section 5 of Materials and Methods. 
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Figure S4: Test of alternative selection scenarios. (A) Distributions of clade-specific ex¬ 
pression level differences, Pc{AE) (equation 39, color code as in Fig. SIA), standard-normalized 
to mean 0 and variance 1. These distributions are approximately Gaussian (black line: stan¬ 
dard normal distribution). (B) Minimal seascape model. Top panel: Divergence-diversity ratio 
D(r) (bullets: simulation results; line: analytical curve as in Fig. lA). Bottom panel: Standard- 
normalized distributions of trait differences, Pr{AE), from simulations for r = 0.21,0.69 and 1.37 
(green, orange, and blue bullets) are of Gaussian form (dotted line). The same quantities are shown 
for two alternative fitness models: (C) Loss-of-function model. Functional genes evolve in a static 
fitness landscape of stabilizing strength c = 4.5; individual genes lose function with rate 7 = 0.04 /r, 
resulting in reduced selection (c —)• 0.01c). The loss events generate a nonlinearity in D(t) and a 
broad tail in Pr{AE) that are not observed in the data. (D) Punctuated fitness seascape. Individ¬ 
ual genes jump to a new, uncorrelated fitness peak with rate 0.16 /r. These dynamics also generate 
a broad tail in Pt-{AE). The Drosophila data of Dq (Fig. lA) and of Pc{AE) together favor the 
minimal seascape model over both alternatives. The loss-of-function model and the punctuated 
seascape model are introduced in section 3, simulation details are given in section 5 of Materials 
and Methods. 
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Figure S5: Adaptive gene expression versus adaptive evolution of protein seqnence. 

(a) The distribution of Oseq = {D^Ps/DsPn) — 1, denoting the fraction of adaptive amino acid 
substitutions [57], is plotted against the cumulative fitness flux of gene expression, 2Ad> (circle: 
average; line: median; box: 50% around median; bars: 70% around median). We find no correlation 
between these statistics, which suggests that adaptive gene expression is an independent mode of 
evolution. (B) The distribution of Oseq plotted against the average sex specificity Fmf signals 
increased adaptive protein evolution in genes with sex-biased expression, which is strongest in 
male-biased genes (cf. the results of ref. [19]). For the definition of sex-biased expression, see 
section 4 of Materials and Methods. 
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Figure S6: Rescaling of gene expression data. The clade-specific divergence-diversity statis¬ 
tics (riciT:) are shown for three variants of expression levels. (A) Raw data: log 2 intensities 
averaged over experimental replicates. (B) Levels shifted to gene average 0 within each replicate 
line. (C) Levels shifted to gene average 0 and normalized to variance 1 within each replicate 
line [50], as used throughout this paper. The theoretical curve ^(t) of the best-fit seascape model 
(c* = 18.6,2iV<h* = 3.6) is shown in all three panels. The additive shift in (B,C) is seen to be 
essential for evolutionary analysis, the subsequent multiplicative rescaling in (C) further reduces 
the noise in the data. See section 1 of Materials and Methods. 
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Figure S7: Simulation tests of the inference scheme. (A,B) Distributions of the cumulative 
fitness flux 2iV<h“ and stabilizing strength c“ inferred from simnlated expression data are plotted 
against the simulation input parameters 2Ad>in and Cjn (red line: median, box: 50% around the 
median, bar: 75% around median). See section 5 of Materials and Methods for simulation details. 
(C) Selection inference for epistatic traits. Simulation resnlts of the actnal fitness flux (A) are 
compared to flux values inferred by the standard D test (□, see section 2 of Methods). Both 
qnantities are plotted against the strength of epistasis, e^, defined as the ratio of epistatic and 
additive trait variance (section 5 of Methods); horizontal lines show the actual fitness flux without 
epistasis (e^ = 0). Simulations are shown for selection parameters (c = 4.5, u = 0.4) (green) and 
(c = 4.5, u = 0) (blue). See section 5 of Materials and Methods for simulation details. 
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Table 1: Selection parameters and amount of adaptation, c*; maximum-likelihood sta¬ 
bilizing strength. 2A^<1>*: maximum-likelihood htness flux, cj: clade-dependent adaptive fraction 
of the gene expression divergence, a: fraction of adaptively regulated genes across the Drosophila 
genus, given by the condition 2A^<1>“ > 4. See equations (33) and (35) for dehnitions; gene classes 
are introduced in section 4 of Materials and Methods. 
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